External String Sorting: Faster and Cache-Oblivious
نویسندگان
چکیده
We give a randomized algorithm for sorting strings in external memory. For K binary strings comprising N words in total, our algorithm finds the sorted order and the longest common prefix sequence of the strings using O( B logM/B( K M ) log( K ) + N B ) I/Os. This bound is never worse than O( B logM/B( K M ) log logM/B( K M ) + N B ) I/Os, and improves on the (deterministic) algorithm of Arge et al. (On sorting strings in external memory, STOC ’97). The error probability of the algorithm can be chosen as O(N−c) for any positive constant c. The algorithm even works in the cache-oblivious model under the tall cache assumption, i.e,, assuming M > B for some > 0. An implication of our result is improved construction algorithms for external memory string dictionaries.
منابع مشابه
Simple Linear Work Suffix Array Construction
A suffix array represents the suffixes of a string in sorted order. Being a simpler and more compact alternative to suffix trees, it is an important tool for full text indexing and other string processing tasks. We introduce the skew algorithm for suffix array construction over integer alphabets that can be implemented to run in linear time using integer sorting as its only nontrivial subroutin...
متن کاملCache-Oblivious and Data-Oblivious Sorting and Applications
Although external-memory sorting has been a classical algorithms abstraction and has been heavily studied in the literature, perhaps somewhat surprisingly, when data-obliviousness is a requirement, even very rudimentary questions remain open. Prior to our work, it is not even known how to construct a comparison-based, external-memory oblivious sorting algorithm that is optimal in IO-cost. We ma...
متن کاملCache-Adaptive Algorithms
We introduce the cache-adaptive model, which generalizes the external-memory model to apply to environments in which the amount of memory available to an algorithm can fluctuate. The cache-adaptive model applies to operating systems, databases, and other systems where the allocation of memory to processes changes over time. We prove that if an optimal cache-oblivious algorithm has a particular ...
متن کاملCache-Oblivious Index for Approximate String Matching
This paper revisits the problem of indexing a text for approximate string matching. Specifically, given a text T of length n and a positive integer k, we want to construct an index of T such that for any input pattern P , we can find all its k-error matches in T efficiently. This problem is well-studied in the internal-memory setting. Here, we extend some of these recent results to external-mem...
متن کاملFunnel Heap - A Cache Oblivious Priority Queue
The cache oblivious model of computation is a two-level memory model with the assumption that the parameters of the model are unknown to the algorithms. A consequence of this assumption is that an algorithm efficient in the cache oblivious model is automatically efficient in a multi-level memory model. Arge et al. recently presented the first optimal cache oblivious priority queue, and demonstr...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2006